Generalized Compression Dictionary Distance as Universal Similarity Measure

Authors

  • Andrey Bogomolov
  • Bruno Lepri
  • Fabio Pianesi
Abstract

We readily accept “01111011001101110001” as the outcome of 20 tosses of a fair coin, but we do not accept the result “00000000000000000000”. However, both results are equally likely, provided that the fair coin model assumption holds. This is a common example of a paradox in probability theory; our reaction is caused by the belief that the first sequence is complicated, whereas the second is simple [2]. A second example of human-inspired limitations is the “Green Lumber Fallacy” introduced by Nassim Nicholas Taleb. It is a fallacy in which a person is “mistaking the source of important or even necessary knowledge, for another less visible from the outside, less tractable one”. Mathematically, it can be expressed as using an incorrect function which, by chance, returns the correct output, so that g(x) is confused with f(x). The root of the fallacy is that “although people may be focusing on the right things, due to complexity of the thing, are not good enough to figure it out intellectually” [1].
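The intuition that the first sequence is "complicated" and the second "simple" can be illustrated with a general-purpose compressor. The sketch below is only an illustration, not the authors' method: it assumes zlib-compressed length as a crude stand-in for complexity, and the helper name `compressed_size` is hypothetical.

```python
import zlib


def compressed_size(bits: str) -> int:
    """Return the zlib-compressed length in bytes (a crude complexity proxy)."""
    return len(zlib.compress(bits.encode("ascii"), 9))


# The two coin-toss outcomes quoted in the abstract.
mixed = "01111011001101110001"
uniform = "00000000000000000000"

# Under the fair coin model, every specific 20-toss sequence is equally likely.
probability = 0.5 ** 20

mixed_size = compressed_size(mixed)
uniform_size = compressed_size(uniform)

print(f"probability of either sequence: {probability:.3e}")
print(f"compressed size, mixed sequence:   {mixed_size} bytes")
print(f"compressed size, uniform sequence: {uniform_size} bytes")
```

On inputs this short the compressor's fixed overhead is a large share of the output, so the absolute sizes mean little; only the ordering of the two sizes is of interest here.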

Related sources

An improved similarity measure of generalized trapezoidal fuzzy numbers and its application in multi-attribute group decision making

Generalized trapezoidal fuzzy numbers (GTFNs) have been widely applied in uncertain decision-making problems. The similarity between GTFNs plays an important part in solving such problems, while there are some limitations in existing similarity measure methods. Thus, based on the cosine similarity, a novel similarity measure of GTFNs is developed which is combined with the concepts of geometric...

Clustering by Compression (arXiv:cs/0312044v1 [cs.CV], 19 Dec 2003)

We present a new method for clustering based on compression. The method doesn't use subject-specific features or background knowledge, and works as follows: First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. T...

Kernelized Rényi distance for subset selection and similarity scoring

Rényi entropy refers to a generalized class of entropies that have been used in several applications. In this work, we derive a non-parametric distance between distributions based on the quadratic Rényi entropy. The distributions are estimated via Parzen density estimates. The quadratic complexity of the distance evaluation is mitigated with GPU-based parallelization. This results in an efficien...

Authorship Analysis based on Data Compression (14 Feb 2014)

This paper proposes to perform authorship analysis using the Fast Compression Distance (FCD), a similarity measure based on compression with dictionaries directly extracted from the written texts. The FCD computes a similarity between two documents through an effective binary search on the intersection set between the two related dictionaries. In the reported experiments the proposed method i...

Journal:
  • CoRR

Volume abs/1410.5792

Publication date: 2014